# Large Language Model

WeClone
WeClone is a project for fine-tuning a large language model on WeChat chat logs, primarily used for high-quality voice cloning and digital avatars. It combines WeChat voice messages with a 0.5B large model, allowing users to interact with their digital avatar through a chatbot. This technology has significant application value in the fields of digital immortality and voice cloning, allowing users to continue communicating with others even when they are absent. The project is iterating rapidly, is currently free, and is suited to users interested in AI and language models.
Speech and Language Processing
37.5K

Dream 7B
Dream 7B is the latest diffusion large language model jointly launched by the NLP group of the University of Hong Kong and Huawei Noah's Ark Lab. It demonstrates excellent performance in text generation, especially in complex reasoning, long-term planning, and contextual coherence. The model adopts advanced training methods, possesses strong planning capabilities and flexible reasoning capabilities, and provides stronger support for various AI applications.
AI Model
41.7K
Chinese Picks

Argo
Xark-Argo is a desktop client product designed to help users easily build and use their own large language models. It supports multiple operating systems, including macOS and Windows, and provides powerful local model deployment capabilities. By integrating ollama, users can download open-source models with one click; the client also supports large-model APIs such as ChatGPT, Claude, and SiliconFlow, greatly reducing the barrier to entry. This product is suitable for individual and enterprise users who need to efficiently process text and manage knowledge, and it offers high flexibility and scalability. There is currently no clear pricing information, but its functional positioning suggests it may target the mid-to-high-end user group.
Development & Tools
104.9K

NotaGen
NotaGen is an innovative symbolic music generation model that enhances music generation quality through three stages: pre-training, fine-tuning, and reinforcement learning. Utilizing large language model technology, it can generate high-quality classical music scores, bringing new possibilities to music creation. The model's main advantages include efficient generation, diverse styles, and high-quality output. It is applicable in music creation, education, and research, with broad application prospects.
Music Generation
113.4K

AoT
Atom of Thoughts (AoT) is a novel reasoning framework that transforms the reasoning process into a Markov process by representing solutions as a combination of atomic problems. This framework significantly improves the performance of large language models on reasoning tasks through decomposition and contraction mechanisms, while reducing wasted computing resources. AoT can be used as an independent reasoning method or as a plugin for existing test-time augmentation methods, flexibly combining the advantages of different methods. This framework is open-source and implemented in Python, making it suitable for researchers and developers to experiment with and apply in the fields of natural language processing and large language models.
Model Training and Deployment
67.9K
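The decompose-then-contract loop described above can be sketched in a few lines. This is a toy illustration of the idea only, not the AoT codebase: each step solves atomic subproblems and contracts their answers into a simpler state, so the next step depends only on the current state (a Markov process). The splitting on `;` and the arithmetic "solver" are purely illustrative stand-ins for an LLM call.

```python
# Toy sketch of AoT-style decompose/contract (illustrative, not the
# official implementation).

def decompose(state):
    """Split the current problem state into atomic subproblems."""
    # Hypothetical format: subproblems separated by ";".
    return [p.strip() for p in state.split(";")]

def solve_atom(atom):
    """Solve one atomic subproblem (stand-in for an LLM call)."""
    return eval(atom, {"__builtins__": {}})  # arithmetic only

def contract(atoms):
    """Contract solved atoms into a new, simpler state."""
    return str(sum(solve_atom(a) for a in atoms))

def aot_answer(question):
    state = question
    while ";" in state:          # still decomposable
        state = contract(decompose(state))
    return solve_atom(state)

print(aot_answer("2*3; 4+1"))   # contracts to "11", then solves → 11
```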

Spark-TTS
Spark-TTS is a highly efficient text-to-speech synthesis model based on large language models, featuring single-stream decoupled speech tokens. Leveraging the power of large language models, it directly reconstructs audio from the codes predicted by the LLM, omitting the separate acoustic-feature generation model, thus improving efficiency and reducing complexity. The model supports zero-shot text-to-speech synthesis, enabling cross-lingual and code-switching scenarios, making it ideal for speech synthesis applications requiring high naturalness and accuracy. It also supports virtual voice creation; users can generate different voices by adjusting parameters such as gender, pitch, and speaking rate. The model aims to address the inefficiencies and complexities of traditional speech synthesis systems, providing a highly efficient, flexible, and powerful solution for research and production. Currently, the model is primarily intended for academic research and legitimate applications such as personalized speech synthesis, assistive technologies, and language research.
Text to Speech
106.5K

Level-Navi Agent Search
Level-Navi Agent is an open-source general-purpose web search agent framework that can decompose complex problems and progressively search for information on the internet until it answers user questions. The accompanying Web24 dataset, which covers five major fields (finance, games, sports, movies, and events), provides a benchmark for evaluating model performance on search tasks. The framework supports zero-shot and few-shot learning, providing an important reference for applying large language models to Chinese web search agents.
AI search
51.9K
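The "search until answered" behavior described above is essentially an iterative agent loop. A minimal sketch, with a fake search backend standing in for real web retrieval (the stopping criterion and function names are illustrative, not Level-Navi's actual API):

```python
# Minimal iterative search-agent loop: keep querying until an answer
# appears or the step budget runs out. `search_fn` stands in for a
# real search backend.

def agent_search(question, search_fn, max_steps=5):
    notes = []
    for _ in range(max_steps):
        query = f"{question} | known: {len(notes)} facts"
        result = search_fn(query)
        if result is None:          # nothing new to learn
            break
        notes.append(result)
        if "answer:" in result:     # toy stopping criterion
            break
    return notes

# A fake search backend for demonstration.
facts = iter(["fact: A", "fact: B", "answer: 42"])
result = agent_search("what is X?", lambda q: next(facts, None))
print(result[-1])  # → "answer: 42"
```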

M2RAG
M2RAG is a benchmark codebase for retrieval-augmented generation in multimodal contexts. It answers questions by retrieving multimodal documents, evaluating the ability of multimodal large language models (MLLMs) to leverage knowledge from multimodal contexts. The model is evaluated on tasks such as image captioning, multimodal question answering, fact verification, and image re-ranking, aiming to improve the effectiveness of models in multimodal contextual learning. M2RAG provides researchers with a standardized testing platform to help advance the development of multimodal language models.
AI Model
51.6K

SWE-RL
SWE-RL is a reinforcement learning-based large language model reasoning technique proposed by Facebook Research, aiming to leverage open-source software evolution data to improve model performance in software engineering tasks. This technology optimizes the model's reasoning capabilities through a rule-driven reward mechanism, enabling it to better understand and generate high-quality code. The main advantages of SWE-RL lie in its innovative reinforcement learning approach and effective utilization of open-source data, opening up new possibilities in the field of software engineering. The technology is currently in the research phase and does not yet have a defined commercial pricing, but it shows significant potential in improving development efficiency and code quality.
Coding Assistant
53.8K

TableGPT2-7B
TableGPT2-7B is a large-scale decoder model developed by Zhejiang University, specifically designed for data-intensive tasks, particularly the interpretation and analysis of tabular data. Based on the Qwen2.5 architecture, it is optimized through Continuous Pretraining (CPT) and Supervised Fine-tuning (SFT) to handle complex table queries and business intelligence (BI) applications. It supports Chinese queries and is suitable for enterprises and research institutions that need to efficiently process structured data. The model is currently open-source and free; more professional versions may be released in the future.
Data Analysis
55.2K

Coding-Tutor
Coding-Tutor is a programming tutoring tool based on a large language model (LLM), designed to help learners improve their programming skills through conversational interaction. It addresses key challenges in programming tutoring through a Trace-and-Verify (Traver) workflow that combines knowledge tracing and round-by-round verification. This tool is not only suitable for programming education but also extensible to other task tutoring scenarios, helping to adjust teaching content according to the learner's knowledge level. The project is open source and supports community contributions.
Education
56.3K
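The two halves of the Traver workflow, knowledge tracing and round-by-round verification, can be sketched as follows. This is a hedged toy model under simple assumptions (a single mastery score per concept, an exponential-moving-average update, an arbitrary threshold), not the paper's actual method:

```python
# Toy sketch of trace-and-verify tutoring: track an estimate of what
# the learner knows, and gate each tutoring turn on that estimate.
# The learning rate and threshold are illustrative.

def update_knowledge(knowledge, concept, correct, lr=0.3):
    """Knowledge-tracing update: move the mastery estimate toward 1.0
    on a correct answer, toward 0.0 on a mistake."""
    prev = knowledge.get(concept, 0.5)
    target = 1.0 if correct else 0.0
    knowledge[concept] = prev + lr * (target - prev)
    return knowledge

def verify_turn(knowledge, concept, threshold=0.6):
    """Round-by-round verification: only advance to material on a
    concept once it looks sufficiently mastered."""
    return knowledge.get(concept, 0.5) >= threshold

k = {}
update_knowledge(k, "loops", correct=True)
update_knowledge(k, "loops", correct=True)
print(verify_turn(k, "loops"))  # mastery ≈ 0.755 → True
```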
Chinese Picks

Tbox AI-Powered Intelligent Agent Builder
Tbox is a large language model-based product designed for Alipay's lifestyle ecosystem, aimed at enabling businesses to quickly build professional-grade intelligent agents and drive business growth. Integrating advanced technologies such as Ant Bailing LLM, Ant Tianjian, and Lingjing Digital Human, Tbox enables enhanced experiences and intelligent decision-making. Applicable to various industries such as public services, government affairs, travel, tourism, and healthcare, Tbox enhances user experience and operational efficiency through intelligent services. Pricing and specific positioning vary according to enterprise needs, providing customized solutions for businesses.
Intelligent Agent
77.8K

MoBA
MoBA (Mixture of Block Attention) is an innovative attention mechanism specifically designed for large language models dealing with long text contexts. It achieves efficient long sequence processing by dividing the context into blocks and allowing each query token to learn to focus on the most relevant blocks. MoBA's main advantage is its ability to seamlessly switch between full attention and sparse attention, ensuring performance while improving computational efficiency. This technology is suitable for tasks that require processing long texts, such as document analysis and code generation, and can significantly reduce computational costs while maintaining high model performance. The open-source implementation of MoBA provides researchers and developers with a powerful tool, driving the application of large language models in long text processing.
Model Training and Deployment
51.3K
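The block-selection idea above can be illustrated in NumPy. This is a deliberately simplified sketch: it scores each block by the dot product of the query with the block's mean key and runs softmax attention over only the top-k blocks. The real MoBA kernel (causal masking, batching, fused sparse attention) is far more involved.

```python
# Illustrative MoBA-style block attention: each query attends only
# within its top-k most relevant key blocks.
import numpy as np

def moba_attention(q, k, v, block_size=4, top_k=2):
    n, d = k.shape
    blocks = n // block_size
    k_blocks = k[: blocks * block_size].reshape(blocks, block_size, d)
    v_blocks = v[: blocks * block_size].reshape(blocks, block_size, d)
    # Gate: score each block by q · mean(block keys); keep top_k blocks.
    gate = k_blocks.mean(axis=1) @ q                 # (blocks,)
    chosen = np.argsort(gate)[-top_k:]
    k_sel = k_blocks[chosen].reshape(-1, d)
    v_sel = v_blocks[chosen].reshape(-1, d)
    # Standard softmax attention over the selected blocks only.
    scores = k_sel @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v_sel

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = moba_attention(q, k, v)
print(out.shape)  # (8,)
```

Setting `top_k = blocks` recovers (near-)full attention, which is the seamless switch between sparse and full attention the description mentions.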

Goedel-Prover
Goedel-Prover is an open-source large language model specializing in automated theorem proving. It significantly enhances the efficiency of automated mathematical problem solving by translating natural language mathematical questions into formal languages (such as Lean 4) and generating formal proofs. The model achieved a success rate of 57.6% on the miniF2F benchmark, surpassing other open-source models. Its key advantages include high performance, open-source extensibility, and a deep understanding of mathematical problems. Goedel-Prover aims to advance automated theorem proving technologies and provide powerful tool support for mathematical research and education.
Research Tools
53.3K
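To make "translating natural-language mathematics into Lean 4" concrete, here is the kind of formal goal such a prover targets: the statement "the sum of two even numbers is even," stated and proved in Lean 4. This is a standard textbook example, not output from Goedel-Prover itself.

```lean
-- "The sum of two even numbers is even," formalized in Lean 4.
theorem even_add_even (m n : Nat)
    (hm : ∃ a, m = 2 * a) (hn : ∃ b, n = 2 * b) :
    ∃ c, m + n = 2 * c := by
  obtain ⟨a, ha⟩ := hm
  obtain ⟨b, hb⟩ := hn
  exact ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩
```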

OmniParser V2.0
OmniParser, developed by Microsoft, is an advanced image parsing technology designed to transform irregular screenshots into structured lists of elements, including the location of interactive areas and functional descriptions of icons. It achieves efficient parsing of UI interfaces through deep learning models like YOLOv8 and Florence-2. Its main advantages lie in its efficiency, accuracy, and broad applicability. OmniParser significantly enhances the performance of user interface agents based on large language models (LLMs), enabling them to better understand and interact with various user interfaces. It performs exceptionally well in various application scenarios, such as automated testing and intelligent assistant development. OmniParser's open-source nature and flexible licensing make it a powerful tool for developers and researchers alike.
AI design tools
89.1K

Mistral-Small-24B-Instruct-2501
Mistral Small 24B is a large language model developed by the Mistral AI team, featuring 24 billion parameters and supporting multilingual conversation and instruction handling. Through instruction tuning, it generates high-quality text content applicable in various scenarios like chat, writing, and programming assistance. Its key advantages include powerful language generation capabilities, multilingual support, and efficient inference. The model caters to individuals and businesses requiring high-performance language processing; it is released under an open-source license and supports local deployment and quantization optimizations, making it suitable for scenarios with data privacy requirements.
Chatbot
57.4K
Fresh Picks

MNN Large Model Android App
The MNN Large Model Android App, developed by Alibaba, is an Android application based on large language models (LLMs). It supports various input and output modalities, including text generation, image recognition, and audio transcription. The app is optimized for inference performance to ensure efficient operation on mobile devices while safeguarding user data privacy, with all processing done locally. It supports a variety of leading model providers, such as Qwen, Gemma, and Llama, making it suitable for various scenarios.
AI Model
166.4K
Chinese Picks

Doubao-1.5-pro
Developed by the Doubao team, Doubao-1.5-pro is a high-performance sparse MoE (Mixture of Experts) large language model. This model achieves an excellent balance between model performance and inference performance through an integrated training-inference design. It excels in various public evaluation benchmarks, showcasing significant advantages in inference efficiency and multi-modal capabilities. The model is suitable for scenarios that require efficient inference and multi-modal interaction, such as natural language processing, image recognition, and speech interaction. Its technical foundation is based on the sparse activation MoE architecture, which optimizes activation parameter ratios and training algorithms to achieve higher performance leverage than traditional dense models. Additionally, it supports dynamic parameter adjustment to cater to diverse application scenarios and cost requirements.
AI Model
452.9K
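The sparse-MoE routing that gives such models their "activated parameters are a small fraction of the total" property can be sketched generically. This is an illustrative top-k router under simple assumptions (no load balancing, no shared experts), not Doubao's architecture:

```python
# Minimal sparse Mixture-of-Experts layer: a router scores experts per
# token and only the top-k experts run.
import numpy as np

def moe_layer(x, expert_weights, router, top_k=2):
    """x: (d,) token vector; expert_weights: (E, d, d); router: (E, d)."""
    logits = router @ x                          # one score per expert
    chosen = np.argsort(logits)[-top_k:]         # sparse activation
    gates = np.exp(logits[chosen])
    gates /= gates.sum()
    # Only the chosen experts compute; the rest stay inactive.
    return sum(g * (expert_weights[e] @ x) for g, e in zip(gates, chosen))

rng = np.random.default_rng(1)
d, E = 4, 8
x = rng.normal(size=d)
experts = rng.normal(size=(E, d, d))
router = rng.normal(size=(E, d))
y = moe_layer(x, experts, router)
print(y.shape)  # (4,)
```

With `top_k=2` of 8 experts, only a quarter of the expert parameters are touched per token, which is the performance-per-FLOP leverage the description refers to.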

DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-Distill-Llama-70B is a large language model developed by the DeepSeek team, based on the Llama-70B architecture and optimized through reinforcement learning. It excels in reasoning, dialogue, and multilingual tasks, supporting diverse applications such as code generation, mathematical reasoning, and natural language processing. Its primary advantages include efficient reasoning capabilities and problem-solving skills for complex tasks, while also supporting both open-source and commercial use. This model is suitable for enterprises and research institutions that require high-performance language generation and reasoning abilities.
AI Model
82.5K

InternVL2.5-78B-MPO
InternVL2.5-MPO is a series of multimodal large language models based on InternVL2.5 and Mixed Preference Optimization (MPO). It excels in multimodal tasks by integrating the recently incrementally pre-trained InternViT with various pre-trained large language models (LLMs) such as InternLM 2.5 and Qwen 2.5, utilizing a randomly initialized MLP projector. This model series has been trained on the multimodal reasoning preference dataset MMPR, which contains approximately 3 million samples, enhancing the model's reasoning capabilities and answer quality through an effective data construction process and mixed preference optimization techniques.
AI Model
56.3K

InternLM3-8B-Instruct
Developed by the InternLM team, InternLM3-8B-Instruct is a large language model featuring exceptional reasoning capabilities and proficiency in knowledge-intensive tasks. Despite being trained on only 4 trillion high-quality tokens, achieving over 75% lower training costs than similar models, it outperforms models such as Llama3.1-8B and Qwen2.5-7B on multiple benchmark tests. It supports a deep reasoning mode for tackling complex inference tasks while also offering smooth user interaction. The model is open-sourced under the Apache-2.0 license, making it suitable for various applications needing efficient reasoning and knowledge processing.
AI Model
50.0K

Dria-Agent-a-3B
Dria-Agent-a-3B is a large language model based on the Qwen2.5-Coder series, specifically tailored for agent applications. It employs a Pythonic approach to function calling, featuring advantages such as concurrent multi-function calls, free-form reasoning and actions, and instant complex solution generation. The model has excelled in various benchmarks, including the Berkeley Function Calling Leaderboard (BFCL), MMLU-Pro, and the Dria-Pythonic-Agent-Benchmark (DPAB). It has 3.09 billion parameters and supports the BF16 tensor type.
Development & Tools
47.7K

Dria-Agent-a-7B
Dria-Agent-a-7B is a large language model trained on the Qwen2.5-Coder series, specializing in agent applications. It utilizes a Pythonic function calling approach, offering advantages over traditional JSON function calls such as concurrent multi-function calls, free-form reasoning and actions, and instant complex solution generation. The model has demonstrated excellent performance across various benchmarks, including the Berkeley Function Calling Leaderboard (BFCL), MMLU-Pro, and the Dria-Pythonic-Agent-Benchmark (DPAB). With 7.62 billion parameters and the BF16 tensor type, it supports text generation tasks. Its key benefits include powerful programming assistance, efficient function calling, and high accuracy in specific domains. The model is suitable for applications requiring complex logic processing and multi-step task execution, such as automated programming and intelligent agents. It is currently available for free use on the Hugging Face platform.
Coding Assistant
49.4K

Dria-Agent-α
Dria-Agent-α is a large language model (LLM) tool interaction framework introduced by Hugging Face. By using Python code to invoke tools, it fully utilizes the reasoning capabilities of LLMs, enabling the model to solve complex problems in a manner closer to human natural language compared to traditional JSON formats. This framework enhances LLM performance in agent scenarios by leveraging Python's popularity and pseudo-code-like syntax. The development of Dria-Agent-α utilized a synthetic data generation tool called Dria, which produces realistic scenarios through a multi-stage pipeline to train the model for complex problem-solving. Currently, two models, Dria-Agent-α-3B and Dria-Agent-α-7B, are available on Hugging Face.
Development & Tools
52.2K
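The advantage of Pythonic function calling over JSON tool calls, claimed across the Dria-Agent entries above, is easy to demonstrate: the model emits ordinary Python, so several tool calls, intermediate variables, and free-form logic fit in one completion. The tools and the emitted snippet below are illustrative, not Dria's actual tool schema:

```python
# Sketch: executing a model's Pythonic call plan. A JSON-style call
# expresses one tool invocation at a time; Python can chain and branch.

def get_weather(city):          # example tool exposed to the model
    return {"Paris": 18, "Oslo": 7}[city]

def send_message(text):         # another example tool
    return f"sent: {text}"

# Hypothetical model completion: multiple calls plus reasoning in one shot.
model_output = """
temps = {c: get_weather(c) for c in ["Paris", "Oslo"]}
warmest = max(temps, key=temps.get)
result = send_message(f"{warmest} is warmest at {temps[warmest]}C")
"""

scope = {"get_weather": get_weather, "send_message": send_message}
exec(model_output, scope)       # execute the model's call plan
print(scope["result"])  # → sent: Paris is warmest at 18C
```

(In production, executing model-generated code would of course need sandboxing; this sketch skips that.)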

Llama-3-Patronus-Lynx-8B-Instruct-Q4_K_M-GGUF
This model is a quantized large language model that utilizes 4-bit quantization technology to reduce storage and computational requirements. With 8.03 billion parameters, it is free for non-commercial use and ideal for high-performance language applications in resource-constrained environments.
AI Model
53.0K
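The 4-bit quantization mentioned above maps float weights to 16 integer levels per block plus a stored scale. A generic round-to-nearest sketch of the idea (not the actual Q4_K_M scheme, which uses grouped super-blocks and extra min values):

```python
# Toy 4-bit weight quantization: integers in [-8, 7] plus a per-block
# float scale; storage drops from 32 bits to ~4 bits per weight.
import numpy as np

def quantize_4bit(w):
    scale = np.abs(w).max() / 7          # signed 4-bit levels: -8..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.91, -0.42, 0.07, -1.3], dtype=np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
print(np.abs(w - w_hat).max() < s)   # reconstruction error stays small
```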

InternVL2.5-38B-MPO
InternVL2.5-MPO is an advanced series of large multimodal language models built on InternVL2.5 and Mixed Preference Optimization (MPO). This series excels in multimodal tasks, capable of processing image, text, and video data while generating high-quality text responses. The model employs a 'ViT-MLP-LLM' paradigm, optimizing visual processing capabilities through pixel unshuffle operations and dynamic resolution strategies. Furthermore, it supports multiple images and video data, further expanding its application scenarios. In multimodal capability assessments, InternVL2.5-MPO surpasses numerous benchmark models, affirming its leadership in the multimodal field.
AI Model
59.6K

Agent Laboratory
Agent Laboratory is a project developed by Samuel Schmidgall and others, aiming to support researchers throughout the entire research process—from literature review to experimental execution to report writing—through dedicated agents driven by large language models. It is not intended to replace human creativity, but rather to complement it, allowing researchers to concentrate on conceptualization and critical thinking while automating repetitive and time-consuming tasks such as coding and documentation. The tool's source code is licensed under the MIT License, permitting the use, modification, and distribution of the code provided that the terms of the MIT License are followed.
Research Tools
56.9K

InternVL2_5-26B-MPO-AWQ
InternVL2_5-26B-MPO-AWQ is a multimodal large language model developed by OpenGVLab, designed to enhance reasoning capabilities through Mixed Preference Optimization (MPO). This model excels in multimodal tasks, effectively managing the complex relationships between images and text. It utilizes a cutting-edge model architecture and optimization techniques, providing significant advantages in handling multimodal data. The model is ideal for scenarios requiring efficient processing and understanding of multimodal data, such as image description generation and multimodal question answering. Its main advantages include powerful reasoning capabilities and an efficient model architecture.
AI Model
48.0K

AnyParser Pro
AnyParser Pro is an innovative document parsing tool developed by CambioML. Utilizing large language model (LLM) technology, it quickly and accurately extracts complete textual content from PDF, PPT, and image files. The main advantages of this technology lie in its efficient processing speed and high precision in parsing, significantly enhancing document processing efficiency. Background information indicates that it was launched by CambioML, a startup incubated by Y Combinator, aimed at providing users with a simple, user-friendly, and powerful document parsing solution. Currently, the product offers a free trial, and users can access its features by obtaining an API key.
Document
58.2K
Fresh Picks

VITA-1.5
VITA-1.5 is an open-source multimodal large language model designed to enable near real-time visual and speech interaction. It significantly reduces interaction latency and enhances multimodal performance, providing users with a smoother interaction experience. The model supports both English and Chinese and is applicable to various scenarios, including image recognition, speech recognition, and natural language processing. Its key advantages include efficient speech processing capabilities and robust multimodal understanding.
AI Model
58.2K
## Featured AI Tools

Flow AI
Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.
Video Production
42.2K

NoCode
NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.
Development Platform
44.4K

ListenHub
ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.
AI
42.0K

MiniMax Agent
MiniMax Agent is an intelligent AI companion built on the latest multimodal technology. Its MCP-based multi-agent collaboration lets AI teams efficiently solve complex problems. It provides features such as instant answers, visual analysis, and voice interaction, which can increase productivity by up to 10 times.
Multimodal technology
43.1K
Chinese Picks

Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's latest released AI image generation model, significantly improving generation speed and image quality. With a super-high compression ratio codec and new diffusion architecture, image generation speed can reach milliseconds, avoiding the waiting time of traditional generation. At the same time, the model improves the realism and detail representation of images through the combination of reinforcement learning algorithms and human aesthetic knowledge, suitable for professional users such as designers and creators.
Image Generation
41.4K

OpenMemory MCP
OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.
open source
42.0K
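A "memory layer" in this sense stores facts locally and retrieves the most relevant ones to prepend to a prompt. A minimal sketch of the pattern, assuming naive keyword-overlap scoring (OpenMemory's actual MCP server uses far richer retrieval; everything here is illustrative):

```python
# Toy local memory layer: add facts, recall the best keyword matches.
# Data never leaves the process, mirroring the privacy-first design.

class MemoryStore:
    def __init__(self):
        self.memories = []            # stays on the user's machine

    def add(self, text):
        self.memories.append(text)

    def recall(self, query, top_k=2):
        """Rank stored memories by keyword overlap with the query."""
        words = set(query.lower().split())
        scored = sorted(
            self.memories,
            key=lambda m: len(words & set(m.lower().split())),
            reverse=True,
        )
        return scored[:top_k]

mem = MemoryStore()
mem.add("user prefers concise answers")
mem.add("user is learning Rust")
mem.add("meeting notes from Monday")
print(mem.recall("answer about Rust"))
```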

FastVLM
FastVLM is an efficient visual encoding model designed specifically for visual language models. It uses the innovative FastViTHD hybrid visual encoder to reduce the time required for encoding high-resolution images and the number of output tokens, resulting in excellent performance in both speed and accuracy. FastVLM is primarily positioned to provide developers with powerful visual language processing capabilities, applicable to various scenarios, particularly performing excellently on mobile devices that require rapid response.
Image Processing
41.4K
Chinese Picks

LiblibAI
LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.
AI Model
6.9M